Abstract

We prove that for any decision tree calculating a boolean
function $f : \{-1,1\}^n \to \{-1,1\}$,

$$\mathrm{Var}[f] \le \sum_{i=1}^{n} \delta_i \, \mathrm{Inf}_i(f),$$

where $\delta_i$ is the probability that the $i$th input variable is
read and $\mathrm{Inf}_i(f)$ is the influence of the $i$th variable on $f$.
The variance, influence and probability are taken with respect to an
arbitrary product measure on $\{-1,1\}^n$. It follows that the minimum
depth of a decision tree calculating a given balanced function is at
least the reciprocal of the largest influence of any input variable.
Likewise, any balanced boolean function with a decision tree of depth
$d$ has a variable with influence at least $1/d$. The only previous
nontrivial lower bound known was $\Omega(d\,2^{-d})$. Our inequality has
many generalizations, allowing us to prove influence lower bounds for
randomized decision trees, decision trees on arbitrary product
probability spaces, and decision trees with non-boolean outputs. As an
application of our results we give a very easy proof that the
randomized query complexity of nontrivial monotone graph properties is
at least $\Omega(v^{4/3}/p^{1/3})$, where $v$ is the number of vertices
and $p \le 1/2$ is the critical threshold probability. This supersedes
the milestone $\Omega(v^{4/3})$ bound of Hajnal [13] and is sometimes
superior to the best known lower bounds of Chakrabarti-Khot [9] and
Friedgut-Kahn-Wigderson [11].

1 Introduction 

1.1 Motivation. 

This paper lies at the intersection of two topics within the 
theory of boolean functions. 

The first topic is decision tree complexity. A deterministic decision
tree (DDT) for a boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is a
deterministic adaptive strategy for reading variables so as to
determine the value of $f$ (a formal definition appears in Section 3.1).
The cost of a DDT on a given input is simply the number of input
variables that it reads, and the DDT complexity of a function $f$,
$D(f)$, is the minimum over all DDTs for $f$ of the maximum cost of any
input. A randomized decision tree (RDT) for $f$ is a probability
distribution over DDTs for $f$; such trees are sometimes known as
zero-error randomized decision trees. The RDT complexity of $f$, $R(f)$,
is the minimum over all RDTs for $f$ of the maximum expected cost of any
input. Decision tree complexity has been studied in theoretical
computer science for over 30 years and there is now a significant body
of research on the subject (for a survey, see, e.g., [8]).

The second topic is variable influences, introduced to theoretical
computer science by Ben-Or and Linial in 1985 [2]. Any $n$-variate
boolean function $f$ has an associated influence vector
$(\mathrm{Inf}_1(f), \ldots, \mathrm{Inf}_n(f))$ where $\mathrm{Inf}_i(f)$
measures the extent to which the value of $f$ depends on variable $i$
(a precise definition appears in Section 1.2). A number of papers have
dealt with properties of this vector and its relation to other
properties of boolean functions; perhaps the best known work along
these lines is that of Kahn, Kalai and Linial [14] ("KKL") concerning
the maximum influence $\mathrm{Inf}_{\max}(f) = \max\{\mathrm{Inf}_i(f) : i \in [n]\}$.
Their result implies, for example, that
$\mathrm{Inf}_{\max}(f) = \Omega(\frac{\log n}{n})$ for any near-balanced
boolean function $f$ (where we say that $f$ is near-balanced if both
$|f^{-1}(1)|/2^n$ and $|f^{-1}(-1)|/2^n$ are $\Omega(1)$).

The question that originally motivated this paper was: what is the
best lower bound on $\mathrm{Inf}_{\max}(f)$ that holds for all
near-balanced boolean functions $f$ satisfying $D(f) \le d$? It is easy
to see that such a function $f$ depends on at most $2^d$ of its
variables and therefore the KKL result implies
$\mathrm{Inf}_{\max}(f) \ge \Omega(\frac{d}{2^d})$; prior to this work,
this was the best lower bound known. Our main inequality for boolean
functions, Theorem 1.1, implies a (tight) lower bound of
$\mathrm{Inf}_{\max}(f) \ge \Omega(\frac{1}{d})$ for any near-balanced
function $f$ satisfying $D(f) \le d$.

In fact, Theorem 1.1 provides a lower bound on a weighted average of
the influence vector, where $\mathrm{Inf}_i(f)$ is weighted by the
probability that a DDT for $f$ queries $x_i$ when $x$ is a randomly
chosen input. This lets us extend our lower bound on
$\mathrm{Inf}_{\max}(f)$ to functions with $R(f) \le d$ and even to
functions with $\Delta(f) \le d$, where $\Delta(f)$ denotes the expected
number of queries made by the best DDT for $f$ on a random input (again,
see Section 1.2 for precise definitions).

1.2 The main theorem for boolean func- 
tions. 

Our main theorem holds in a very general setting, that of functions
from product probability spaces into metric spaces. However the case
of greatest interest to us is much simpler. Fix some $p \in (0,1)$ and
let $\{-1,1\}^n_{(p)}$ denote the discrete cube endowed with the
$p$-biased product measure,
$\mu_{(p)}(x) = p^{|\{i : x_i = 1\}|}(1-p)^{|\{i : x_i = -1\}|}$. When we
write simply $\{-1,1\}^n$ the uniform measure case $p = 1/2$ is implied.
Our main interest is in boolean functions
$f : \{-1,1\}^n_{(p)} \to \{-1,1\}$, and in this section we will describe
our main theorem in this case.
First we recall a few definitions. We have

$$\mathrm{Var}[f] = \mathrm{E}[f^2] - \mathrm{E}[f]^2 = 4 \Pr[f = 1] \Pr[f = -1].$$

This measures the "balance" of $f$; if $f$ is equally likely to be $1$
as $-1$, then $\mathrm{Var}[f] = 1$. We also make the following
definition for the influence of the $i$th coordinate on $f$:

$$\mathrm{Inf}_i(f) = 2 \Pr[f(x) \ne f(x^{(i)})],$$

where $x$ is drawn from $\{-1,1\}^n_{(p)}$ and $x^{(i)}$ is formed by
rerandomizing the $i$th coordinate of $x$. Note that our definition
agrees with the one introduced in [2] in the uniform measure case
$p = 1/2$, which was $\mathrm{Inf}_i[f] = \Pr[f(x) \ne f(x^{\oplus i})]$.
(Our definition differs from the $p$-biased notion of influences used
in, e.g., [12] by a factor of $4p(1-p)$; we prefer rerandomizing the
$i$th coordinate to flipping it, since this makes sense in more general
product probability spaces which we will consider later.) We call
$\mathrm{Inf}(f) := \sum_{i=1}^{n} \mathrm{Inf}_i(f)$ the total influence
of $f$.
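
The definitions above can be checked by brute force on small examples. The following is a minimal Python sketch, not part of the original development: the helper names (`weight`, `variance`, `influence`, `flip_influence`) and the choice of the 3-bit majority function `maj3` are illustrative assumptions. It computes $\mathrm{Var}[f]$ and $\mathrm{Inf}_i(f)$ exactly under the $p$-biased measure and compares the rerandomizing definition with the flipping one.

```python
from fractions import Fraction
from itertools import product

def weight(x, p):
    """p-biased product measure: each coordinate is +1 with probability p."""
    w = Fraction(1)
    for xi in x:
        w *= p if xi == 1 else 1 - p
    return w

def variance(f, n, p):
    """Var[f] = E[f^2] - E[f]^2 under the p-biased measure."""
    cube = list(product((-1, 1), repeat=n))
    mean = sum(weight(x, p) * f(x) for x in cube)
    second = sum(weight(x, p) * f(x) ** 2 for x in cube)
    return second - mean ** 2

def influence(f, n, p, i):
    """Inf_i(f) = 2 Pr[f(x) != f(x^(i))], where x^(i) rerandomizes coordinate i."""
    total = Fraction(0)
    for x in product((-1, 1), repeat=n):
        for b in (-1, 1):  # b is the fresh value drawn for coordinate i
            xi = x[:i] + (b,) + x[i + 1:]
            if f(x) != f(xi):
                total += weight(x, p) * (p if b == 1 else 1 - p)
    return 2 * total

def flip_influence(f, n, p, i):
    """Pr[f(x) != f(x with coordinate i flipped)] under the p-biased measure."""
    total = Fraction(0)
    for x in product((-1, 1), repeat=n):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += weight(x, p)
    return total

def maj3(x):
    """Majority of three +-1 values (illustrative test function)."""
    return 1 if sum(x) > 0 else -1
```

For `maj3` at $p = 1/2$ the two notions coincide; at other values of $p$ they differ by the factor $4p(1-p)$ mentioned in the text.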

Finally, since the notion of influences involves randomizing over the
input domain, it makes sense to introduce a notion of randomizing over
inputs for decision trees. Let $T$ be a DDT computing a function
$f : \{-1,1\}^n_{(p)} \to \{-1,1\}$. We write

$$\delta_i(T) = \Pr_{x \leftarrow \{-1,1\}^n_{(p)}}[T \text{ queries } x_i],$$

and

$$\Delta(T) = \sum_{i=1}^{n} \delta_i(T) = \mathrm{E}_{x \leftarrow \{-1,1\}^n_{(p)}}[\#\text{ coords } T \text{ queries on } x].$$

We also let $\Delta(f)$ denote the minimum of $\Delta(T)$ over all DDTs
$T$ computing $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$. It is easy to see that
this is equivalent to minimizing over all RDTs computing $f$; hence
$\Delta(f) \le R(f)$ for all $p$. Also note that $\Delta(f)$ can be
upper-bounded in terms of the size (number of leaves) of the smallest
DDT $T$ for $f$: [19] shows $\Delta(f) \le \log_2(\mathrm{size}(T))/H(p)$,
where $H(p) = -p \log_2 p - (1-p) \log_2(1-p)$ is the binary entropy of
$p$.

We may now state our main theorem in the case of functions
$f : \{-1,1\}^n_{(p)} \to \{-1,1\}$:

Theorem 1.1 Let $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$ and let $T$ be a DDT
computing $f$. Then

$$\mathrm{Var}[f] \le \sum_{i=1}^{n} \delta_i(T) \, \mathrm{Inf}_i(f).$$

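As an immediate corollary we obtain the lower bound on
$\mathrm{Inf}_{\max}(f)$ mentioned in Section 1.1.

Corollary 1.2 For every $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$ we have

$$\Delta(f) \ge \frac{\mathrm{Var}[f]}{\mathrm{Inf}_{\max}(f)}.$$

Proof: Let $T$ be a DDT computing $f$. From Theorem 1.1,

$$\mathrm{Var}[f] \le \sum_{i=1}^{n} \delta_i(T) \, \mathrm{Inf}_i(f) \le \mathrm{Inf}_{\max}(f) \sum_{i=1}^{n} \delta_i(T) = \mathrm{Inf}_{\max}(f) \cdot \Delta(T). \qquad \Box$$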
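
As a sanity check of Theorem 1.1 and Corollary 1.2, the quantities involved can be computed exactly for a small tree. The sketch below is an illustration only: the nested-tuple tree encoding, the helper names and the particular tree `MAJ3_TREE` (a natural DDT for the 3-bit majority function) are assumptions of this example, not constructions from the paper.

```python
from fractions import Fraction
from itertools import product

# A tree is a leaf value (+1 or -1) or a triple (i, low, high): query
# coordinate i (0-indexed), then continue in `low` if x[i] == -1, else `high`.
MAJ3_TREE = (0, (1, -1, (2, -1, 1)), (1, (2, -1, 1), 1))

def run(tree, x):
    """Return (output value, set of coordinates queried) on input x."""
    queried = set()
    while isinstance(tree, tuple):
        i, low, high = tree
        queried.add(i)
        tree = high if x[i] == 1 else low
    return tree, queried

def weight(x, p):
    """p-biased product measure: each coordinate is +1 with probability p."""
    w = Fraction(1)
    for xi in x:
        w *= p if xi == 1 else 1 - p
    return w

def quantities(tree, n, p):
    """Return (Var[f], [delta_i(T)], [Inf_i(f)]) for the f computed by tree."""
    cube = list(product((-1, 1), repeat=n))
    out = {x: run(tree, x) for x in cube}
    mean = sum(weight(x, p) * out[x][0] for x in cube)
    var = 1 - mean ** 2  # E[f^2] = 1 because f takes values in {-1, 1}
    delta = [sum(weight(x, p) for x in cube if i in out[x][1]) for i in range(n)]
    inf = []
    for i in range(n):
        disagree = Fraction(0)
        for x in cube:
            for b in (-1, 1):  # b is the rerandomized value of coordinate i
                xi = x[:i] + (b,) + x[i + 1:]
                if out[x][0] != out[xi][0]:
                    disagree += weight(x, p) * (p if b == 1 else 1 - p)
        inf.append(2 * disagree)
    return var, delta, inf

var, delta, inf = quantities(MAJ3_TREE, 3, Fraction(1, 2))
rhs = sum(d * v for d, v in zip(delta, inf))  # sum_i delta_i(T) Inf_i(f)
```

At $p = 1/2$ this tree has $\mathrm{Var}[f] = 1$, $(\delta_1, \delta_2, \delta_3) = (1, 1, 1/2)$ and every influence equal to $1/2$, so the right-hand side of Theorem 1.1 is $5/4$ and $\Delta(T) = 5/2 \ge \mathrm{Var}[f]/\mathrm{Inf}_{\max}(f) = 2$.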

Some brief comments on our main theorem:

• It is linear in the $\delta_i(T)$'s. Hence if we allow an RDT $\mathcal{T}$
for $f$ and make the natural definition of $\delta_i(\mathcal{T})$, the
result still holds by averaging over the distribution $\mathcal{T}$.

• It can be sharp; see Section 3.5 for cases of equality.

• Other corollaries along the lines of Corollary 1.2 follow; for
example, if $d$ is an integer $\ge \Delta(f)$, then the sum of the
influences of the $d$ most influential variables is at least
$\mathrm{Var}[f]$.

• In Section 3.3 we will give a "two function" version, which yields a
lower bound for the randomized decision tree complexity of
approximating $f$.

1.2.1 Influence lower bounds — comparison with 
previous work. 

Proving lower bounds on the influences of boolean functions has had a
long history in theoretical computer science, starting with the 1985
paper of Ben-Or and Linial [2] on collective coin flipping. Ben-Or and
Linial made the basic observation that if $f : \{-1,1\}^n \to \{-1,1\}$
is balanced (i.e., $\mathrm{E}[f] = 0$), then
$\mathrm{Inf}_{\max}(f) \ge \frac{1}{n}$. This follows from the edge
isoperimetric inequality on the discrete cube (see, e.g., [6]);
however, it is more instructive for us to view it as following from
the Efron-Stein inequality [10, 26],

$$\mathrm{Var}[f] \le \mathrm{Inf}(f) = \sum_{i=1}^{n} \mathrm{Inf}_i(f), \tag{1}$$

which holds in the general $p$-biased case, and also in the much more
general setting of $f : \Omega \to \mathbb{R}$, where $\Omega$ is an
$n$-wise product probability space and $\mathrm{Inf}_i$ is defined
appropriately for real-valued functions (specifically, with the
"$\rho_2$ semimetric" discussed in Section 3.4). Theorem 1.1 is
immediately seen to improve the Efron-Stein inequality in the case of
functions $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$.

Ben-Or and Linial constructed a balanced function
$f : \{-1,1\}^n \to \{-1,1\}$ ("Tribes") satisfying
$\mathrm{Inf}_{\max}(f) = \Theta(\frac{\log n}{n})$ and conjectured that
for every balanced function $f : \{-1,1\}^n \to \{-1,1\}$,
$\mathrm{Inf}_{\max}$ cannot be smaller. There were small improvements on
the simple $\frac{1}{n}$ bound (by Alon, and by Chor and Gereb-Graus)
before the famous KKL paper [14] confirmed the conjecture. Note that
our theorem improves upon KKL whenever $f$ has
$\Delta(f) = o(n/\log n)$; in particular, whenever $f$ has a DDT of size
$2^{o(n/\log n)}$.

The KKL result was subsequently generalized by Talagrand [27, Theorem
1.5] who proved that for any $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$,

$$\mathrm{Var}[f] \le O\!\left(\log \frac{1}{p(1-p)}\right) \sum_{i=1}^{n} \frac{\mathrm{Inf}_i(f)}{\log(1/\mathrm{Inf}_i(f))}. \tag{2}$$

Talagrand's motivation for proving this was that when
$f : \{-1,1\}^n_{(p)} \to \{-1,1\}$ is monotone, lower bounds on the sum
of $f$'s influences imply a "sharp threshold" for $f$, via the
Russo-Margulis lemma [16, 21]. Indeed, this connection with threshold
phenomena is one of the chief motivations for studying influences, and
it is considered an important problem in the theory of boolean
functions and random graphs to provide general conditions under which
the total influence is large [7]. Our main inequality provides such a
condition: $\mathrm{Inf}(f)$ is large if $f$ has a randomized decision
tree $\mathcal{T}$ with $\delta_i(\mathcal{T})$ small for all $i$. Note
that when $f$ is a transitive function, this is equivalent to the
natural condition that $\Delta(f)$ is small. (See Section 2 for
definitions of monotone and transitive functions, as well as further
discussion of random graph properties.)

In particular, ours seems to be the first quantitatively strong
influence lower bound that takes into account the "structure" or
computational complexity of $f$. We note that previously achievable
lower bounds on influences in terms of some measure of the complexity
of $f$ yield quantitatively much weaker results than can be obtained
from our inequality. For instance, Nisan and Szegedy showed that if
$f : \{-1,1\}^n \to \{-1,1\}$ is computed by a polynomial over
$\mathbb{R}$ of degree $\deg(f)$, then every coordinate $i$ with nonzero
influence has $\mathrm{Inf}_i(f) \ge 2^{-\deg(f)}$. Since
$D(f) \le O(\deg(f)^4)$ (by a result of Nisan and Smolensky [8]), our
Corollary 1.2 implies that the maximum influence in fact satisfies
$\mathrm{Inf}_{\max}(f) \ge \Omega(\mathrm{Var}[f]/\deg(f)^4)$. As another
example, suppose $f : \{-1,1\}^n \to \{-1,1\}$ is approximately computed
by a polynomial over $\mathbb{R}$ of degree $\widetilde{\deg}(f)$, i.e.,
there is a polynomial $p(x)$ of degree $\widetilde{\deg}(f)$ such that
$|p(x) - f(x)| \le 1/3$ for all $x$. Talagrand's result implies that
$\mathrm{Inf}_{\max}(f) \ge \exp(-O(\mathrm{Inf}(f)/\mathrm{Var}[f]))$.
Since by [24] we have $\mathrm{Inf}(f) \le O(\widetilde{\deg}(f))$, one
could conclude that
$\mathrm{Inf}_{\max}(f) \ge \exp(-O(\widetilde{\deg}(f)/\mathrm{Var}[f]))$.
By contrast, since $D(f) \le O(\widetilde{\deg}(f)^6)$, our Corollary 1.2
implies that the maximum influence in fact satisfies
$\mathrm{Inf}_{\max}(f) \ge \Omega(\mathrm{Var}[f]/\widetilde{\deg}(f)^6)$.

2 Randomized decision tree com- 
plexity lower bounds 

In this section we give an application of Theorem 1.1 to the problem
of randomized decision tree complexity for monotone graph properties.
We prove Theorem 1.1 in a more general setting in Section 3.

2.1 History. 

As mentioned in Section 1.1, decision tree complexity has been
extensively studied for over three decades. Two special classes of
functions have played a prominent role in these investigations. The
first is the class of monotone functions, those satisfying
$f(y) \ge f(x)$ whenever $y \ge x$ under the componentwise partial
order. The second is the class of transitive functions. An
automorphism of the $n$-variate boolean function $f$ is a permutation
$\sigma$ of $[n]$ satisfying
$f(x_1, \ldots, x_n) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ for all
inputs $x$. We say that $f$ is transitive if for each pair $i, j \in [n]$
there is an automorphism of $f$ that sends $i$ to $j$. For example,
Rivest and Vuillemin [20] proved that for $n$ a prime power, any
$n$-variate monotone transitive function $f$ has $D(f) = n$.

One long studied open question about boolean decision tree complexity
is the following: how small can $R(f)$ be in relation to $D(f)$? It is
well known that $R(f) \ge \Omega(\sqrt{D(f)})$ for any function $f$, and
this is the best general lower bound known. The largest known
separation is given by the following recursively defined function: Let
$f_0$ be the identity function on a single variable and for $k \ge 1$,
let $f_k$ be the function on $n = 4^k$ variables given by
$(f^1_{k-1} \wedge f^2_{k-1}) \vee (f^3_{k-1} \wedge f^4_{k-1})$, where
$f^i_{k-1}$ is the value of $f_{k-1}$ on the $i$th group of $4^{k-1}$
variables. The function $f_k$ is monotone and transitive, and so by the
above result of Rivest and Vuillemin, $D(f_k) = n$. Snir [25] gave an
RDT for $f_k$ establishing $R(f_k) \le n^{\beta}$ where
$\beta = \log_2\frac{1+\sqrt{33}}{4} \approx 0.753$. Saks and Wigderson
[22] proved that Snir's RDT is optimal for $f_k$ and conjectured that
$R(f) \ge \Omega(D(f)^{\beta})$ for any boolean function; this is not
even known to hold for all monotone transitive functions.
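
The recursive definition of $f_k$ is easy to make concrete. The following Python sketch is an illustration under stated assumptions: it treats $+1$ as "true" (so AND is `min` and OR is `max` on $\{-1,1\}$), and the names `f_k`, `is_monotone` and `beta` are chosen here, not taken from the paper. It evaluates $f_k$ on an input of length $4^k$ and computes the exponent $\beta$ numerically.

```python
import math
from itertools import product

def f_k(k, x):
    """Evaluate the recursively defined function on a tuple x of 4**k values
    in {-1, 1}, reading +1 as 'true' (an assumption of this sketch)."""
    if k == 0:
        return x[0]
    m = 4 ** (k - 1)
    # Values of f_{k-1} on the four consecutive groups of 4**(k-1) variables.
    g = [f_k(k - 1, x[j * m:(j + 1) * m]) for j in range(4)]
    # (g1 AND g2) OR (g3 AND g4), with AND = min and OR = max on {-1, 1}.
    return max(min(g[0], g[1]), min(g[2], g[3]))

def is_monotone(k):
    """Brute-force check that f_k(x) <= f_k(y) whenever x <= y coordinatewise."""
    cube = list(product((-1, 1), repeat=4 ** k))
    return all(
        f_k(k, x) <= f_k(k, y)
        for x in cube
        for y in cube
        if all(a <= b for a, b in zip(x, y))
    )

# Exponent in the bound R(f_k) <= n**beta.
beta = math.log2((1 + math.sqrt(33)) / 4)
```

The brute-force monotonicity check is only practical for $k = 1$ (16 inputs); it is included to illustrate the definition rather than as a proof.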

A well studied subclass of transitive boolean functions consists of
functions derived from graph properties. A property of $v$-vertex
(undirected) graphs is a set of graphs on vertex set
$V = \{1, \ldots, v\}$ that is invariant under vertex relabellings;
e.g., the set of graphs on $V$ that are properly 3-colorable. We
restrict attention to properties that are non-trivial; i.e., at least
one graph has the property and at least one graph does not have the
property.

Let $\binom{V}{2}$ denote the set of 2-element subsets of $V$. Each
graph $G$ on $V$ can be identified with the boolean vector
$x^G \in \{-1,1\}^{\binom{V}{2}}$ where $x^G_{\{i,j\}}$ is $1$ if
$\{i,j\} \in E(G)$ and is $-1$ otherwise. A graph property $\mathcal{P}$
is thus naturally identified with a boolean function
$f_{\mathcal{P}} : \{-1,1\}^{\binom{V}{2}} \to \{-1,1\}$ which maps the
vector $x^G$ to $1$ if and only if $G$ satisfies $\mathcal{P}$. The
invariance of properties under vertex relabellings implies that the
associated functions are transitive.
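
The identification of graphs with boolean vectors can be sketched in a few lines of Python. Everything named below (`edge_index`, `encode`, `relabel`, `is_invariant`, and the example property "contains a triangle") is a hypothetical illustration, with vertices numbered from 0 rather than 1.

```python
from itertools import combinations, permutations

def edge_index(v):
    """List the 2-element subsets of {0, ..., v-1} and map each to its position."""
    pairs = list(combinations(range(v), 2))
    return pairs, {pair: k for k, pair in enumerate(pairs)}

def encode(edges, v):
    """Encode a graph as a vector in {-1, 1} indexed by vertex pairs."""
    pairs, _ = edge_index(v)
    edge_set = {tuple(sorted(e)) for e in edges}
    return tuple(1 if pair in edge_set else -1 for pair in pairs)

def has_triangle(x, v):
    """Example monotone graph property: 1 if the graph contains a triangle."""
    _, index = edge_index(v)
    found = any(
        x[index[(a, b)]] == 1 and x[index[(a, c)]] == 1 and x[index[(b, c)]] == 1
        for a, b, c in combinations(range(v), 3)
    )
    return 1 if found else -1

def relabel(x, v, sigma):
    """Apply the vertex permutation sigma to the graph encoded by x."""
    pairs, index = edge_index(v)
    y = [-1] * len(pairs)
    for (a, b), val in zip(pairs, x):
        y[index[tuple(sorted((sigma[a], sigma[b])))]] = val
    return tuple(y)

def is_invariant(prop, x, v):
    """Check that prop takes the same value on every relabelling of x."""
    return all(
        prop(relabel(x, v, s), v) == prop(x, v) for s in permutations(range(v))
    )
```

Each vertex permutation induces a permutation of the $\binom{v}{2}$ coordinates, and these induced permutations are the automorphisms that make $f_{\mathcal{P}}$ transitive.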

There are examples of graph properties on $v$ vertices that have
deterministic decision trees of depth $O(v)$; e.g., the property of
being a "scorpion graph" [4]. However, for graph properties that are
monotone (those whose associated function is monotone), Rivest and
Vuillemin [20] proved a lower bound $\Omega(v^2)$ on DDT complexity. A
conjecture made by Yao [28] and also attributed to Karp [22] is that
this $\Omega(v^2)$ lower bound extends to RDT complexity. This is the
problem we make progress on in this section.

Yao observed that an $\Omega(v)$ lower bound for RDT computation of
monotone graph properties is easy to prove; this also follows from the
general bound $R(f) = \Omega(\sqrt{D(f)})$ mentioned earlier. The first
improvement on this naive bound came a decade later from Yao himself,
who proved an $\Omega(v \log^{1/12} v)$ lower bound using "graph
packing" arguments [29]. These arguments were improved by King [15],
yielding an $\Omega(v^{5/4})$ lower bound, and by Hajnal [13], yielding
an $\Omega(v^{4/3})$ lower bound. This lower bound stood for a decade
before Chakrabarti and Khot [9] gave a small improvement to
$\Omega(v^{4/3} \log^{1/3} v)$. Both the Hajnal and Chakrabarti-Khot
bounds have rather long and technical proofs based on graph packing.

Fairly recently, Friedgut, Kahn and Wigderson [11] proved a general
lower bound of a somewhat different form. Given a nonconstant monotone
boolean function $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$, it is easy to see
that $\mathrm{E}[f]$ is a continuous increasing function of $p$;
therefore there is a critical probability $p$ for which
$\mathrm{E}[f] = 0$, i.e., $\mathrm{Var}[f] = 1$. Friedgut, Kahn and
Wigderson proved that any nontrivial monotone $v$-vertex graph property
has RDT complexity

$$\Omega\!\left(\min\left\{\frac{v}{\min(p, 1-p)}, \frac{v^2}{\log v}\right\}\right)$$

when $p$ is the critical probability for $f$. In fact, they show that
$\Delta(f)$ is at least this quantity. The FKW bound can improve on
Chakrabarti-Khot in cases where the critical probability is
sufficiently close to 0 or 1. We remark that the proof in FKW also uses
a graph packing argument.

2.2 Our R(f) lower bound. 

As a simple consequence of our elementary main inequality Theorem 1.1
and a recent elementary inequality from [19], we obtain the following:

Theorem 2.1 Let $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$ be a nonconstant
monotone transitive function, where $p$ is the critical probability for
$f$ (i.e., $f$ is balanced). Write $q = 1 - p$. Then

$$R(f) \ge \Delta(f) \ge \frac{n^{2/3}}{(4pq)^{1/3}}.$$

In particular,

$$R(f) \ge \Delta(f) \ge \frac{(v-1)^{4/3}}{(16pq)^{1/3}}$$

if $f$ corresponds to a $v$-vertex graph property.

Proof: The inequality we need from [19] is the following: For all $p$,
if $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$ is monotone then

$$\mathrm{Inf}(f) \le 2\sqrt{pq\,\Delta(f)}. \tag{3}$$

Fix $p$ to be the critical probability of $f$ and let $T$ be a DDT
computing $f$ with expected cost $\Delta(f)$. We apply Theorem 1.1 using
$\mathrm{Var}[f] = 1$ since $p$ is critical and
$\mathrm{Inf}_i(f) = \mathrm{Inf}(f)/n$ since $f$ is transitive (and
hence all coordinates have the same influence). This gives
$1 \le (\mathrm{Inf}(f)/n) \cdot \Delta(f)$. Using (3) to bound
$\mathrm{Inf}(f)$ we get $1 \le (2\sqrt{pq}/n) \cdot \Delta(f)^{3/2}$, and
this can be rearranged to give the desired result. $\Box$
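
For completeness, the rearrangement at the end of the proof can be written out as follows. This is a worked restatement of the step in the text, with the graph case using only $n = \binom{v}{2} \ge (v-1)^2/2$.

```latex
\begin{align*}
1 \le \frac{2\sqrt{pq}}{n}\,\Delta(f)^{3/2}
  &\iff \Delta(f)^{3/2} \ge \frac{n}{2\sqrt{pq}} \\
  &\iff \Delta(f) \ge \left(\frac{n}{2\sqrt{pq}}\right)^{2/3}
        = \frac{n^{2/3}}{(4pq)^{1/3}}. \\[4pt]
\text{If } n = \binom{v}{2} = \frac{v(v-1)}{2} \ge \frac{(v-1)^2}{2}:\quad
\Delta(f) &\ge \frac{\bigl((v-1)^2/2\bigr)^{2/3}}{(4pq)^{1/3}}
        = \frac{(v-1)^{4/3}}{4^{1/3}\,(4pq)^{1/3}}
        = \frac{(v-1)^{4/3}}{(16pq)^{1/3}}.
\end{align*}
```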



2.3 Discussion. 

In the case of monotone graph properties, our result always improves
on Hajnal's $\Omega(v^{4/3})$ lower bound and can be superior to both
Chakrabarti-Khot (when $\min\{p, q\}$ is small enough) and to FKW (when
$\min\{p, q\}$ is large enough). It is worth noting that unlike all
previous lower bounds for monotone graph properties, our proof makes no
use of graph packing arguments, instead relying only on elementary
probabilistic arguments.

Most interestingly, we obtain a result essentially as good as the best
unconditional bound (Chakrabarti-Khot) in the more general context of
monotone transitive functions, not just graph properties. Further, our
bound for monotone transitive functions is known to be essentially
tight in the case $p = 1/2$: in [3], a sequence
$f_n : \{-1,1\}^n \to \{-1,1\}$ of balanced monotone transitive
functions is presented with $\Delta(f_n) \le O(n^{2/3} \log n)$. Our
present Theorem 1.1 is used in [3] to show that
$\Delta(f_n) = \Omega(n^{2/3})$, by an argument similar to the proof of
Theorem 2.1 but using an inequality from [23] in place of (3).

It is tantalizing that the place where the RDT complexity of monotone
graph properties has been stuck for almost 15 years, $v^{4/3}$, is
exactly the tight bound for monotone transitive functions. Perhaps this
suggests that in some way the argument of Hajnal is not really using
the fact that $f$ is a graph property, just that it is transitive.
Indeed, one might wonder the same thing about Chakrabarti-Khot, since
their $v^{4/3} \log^{1/3} v$ lower bound could also hold for monotone
transitive functions; the example of [3] does not rule it out.

3 The main inequality 

3.1 Decision trees, variation, influences — 
general definitions. 

The proof of Theorem 1.1 is most naturally carried out in a
significantly more general context than that of functions
$f : \{-1,1\}^n_{(p)} \to \{-1,1\}$. Specifically, we will consider
functions

$$f : \Omega \to Z$$

mapping a product probability space into a metric space. In this
section we give the necessary definitions.

Let us begin with the domain. Here we have an $n$-wise product
probability space $\Omega = (X, \mu)$, meaning that the underlying set
$X$ is a product set $X_1 \times \cdots \times X_n$ and the measure $\mu$
is a product probability measure $\mu_1 \times \cdots \times \mu_n$,
where $\mu_i$ is a probability measure on $X_i$. For simplicity, we
assume that $X$ is finite. We write $\Omega_i$ for the probability space
$(X_i, \mu_i)$. We use the notation $x \leftarrow \Omega$ to mean that
$x$ is an element of $X$ randomly selected according to $\mu$.

The range of our functions is a metric space $(Z, d)$. (Actually we can
allow a "pseudo-metric", meaning we may omit the requirement that
$d(z, z') = 0 \Rightarrow z = z'$.) Useful examples to keep in mind are
the following: $Z$ any finite set with $d(z, z') = 1_{z \ne z'}$, and
$Z = \mathbb{R}$ with $d(z, z') = |z - z'|$. Of course, in the special
case of boolean-valued functions, $Z = \{-1, 1\}$, all metrics are the
same up to a constant factor.

The definitions of decision trees in the context of functions mapping
a product set domain $X = X_1 \times \cdots \times X_n$ into a set $Z$
are the obvious ones. Briefly, a DDT will be a rooted directed tree $T$
in which each internal node $v$ is labelled by a coordinate
$i_v \in [n]$ and each leaf is labelled by an element of the output set
$Z$. Further, the arcs emanating from each internal node $v$ must be in
one-to-one correspondence with $X_{i_v}$. The node labels along every
root-leaf path are required to be distinct. $T$ computes a function
$f_T : X \to Z$ in the obvious way; we retain the notion of the cost of
$T$ on input $x$ as the length of the root-leaf path $T$ follows on
input $x$. Thus, we have the usual notions of $D(T)$ and $D(f)$, and
also the (zero-error) randomized decision tree complexities $R(T)$ and
$R(f)$. With the product probability measure $\mu$ on $X$, we can also
naturally extend our notions of expected cost from Section 1.2: given a
DDT $T$ computing $f$,

$$\delta^{\mu}_i(T) = \Pr_{x \leftarrow \Omega}[T \text{ queries } x_i],$$

and $\Delta^{\mu}(T)$ and $\Delta^{\mu}(f)$ are similarly defined. We
will henceforth drop the superscript $\mu$ when it is clear from
context. Note that, as before, we have $\Delta(f) \le R(f)$.

We now give the definitions of variation and influences for functions
$f : \Omega \to Z$. The variation of $f : \Omega \to Z$ is

$$\mathrm{Vr}^{\mu,d}[f] = \mathrm{E}_{(x,y) \leftarrow \Omega \times \Omega}[d(f(x), f(y))].$$

To define influences, first let $\Omega^{(i)}$ denote the probability
space given by pairs $(x, x^{(i)})$, where $x$ is chosen from $\Omega$
and $x^{(i)}$ is formed by rerandomizing the $i$th coordinate of $x$
using $\mu_i$. Then the influence of the $i$th coordinate on
$f : \Omega \to Z$ is defined to be

$$\mathrm{Inf}^{\mu,d}_i(f) = \mathrm{E}_{(x, x^{(i)}) \leftarrow \Omega^{(i)}}[d(f(x), f(x^{(i)}))].$$

We will usually drop the superscripts $\mu$ and $d$ on $\mathrm{Vr}$ and
$\mathrm{Inf}_i$ when they are implied by context. Note that if we view
the functions $f : \{-1,1\}^n_{(p)} \to \{-1,1\}$ from Section 1.2 as
mapping into the metric space on $\{-1,1\}$ with distance $d$ given by
$d(z, z') = |z - z'| = 2 \cdot 1_{z \ne z'}$, then we get agreement in
the definitions of $\mathrm{Inf}_i(f)$ and also
$\mathrm{Vr}[f] = \mathrm{Var}[f]$.

3.2 Theorem and proof. 

We now state and prove our main inequality, which includes Theorem 1.1
as a special case.

Theorem 3.1 Let $f : \Omega \to (Z, d)$ be a function mapping a finite
$n$-wise product probability space into a metric space, and let $T$ be
a DDT computing $f$. Then

$$\mathrm{Vr}[f] \le \sum_{i=1}^{n} \delta_i(T) \, \mathrm{Inf}_i(f).$$

Proof: Let $x$ and $y$ be random inputs chosen independently from
$\Omega$. Given a subset $J \subseteq [n]$ we will write $x_J y$ for the
hybrid input in $X$ that agrees with $x$ on the coordinates in $J$ and
with $y$ on the coordinates in $[n] \setminus J$. Let $i_1, \ldots, i_s$
denote the sequence of variables queried by $T$ on input $x$ (these
$i$'s are random variables and $s$ is also a random variable). For
$t \ge 0$, let $J[t] = \{i_r : s \ge r > t\}$. Finally, let
$u[t] = x_{J[t]} y$. (For example, if $x = (1, -1, 1, 1)$,
$y = (1, 1, -1, -1)$, and the tree reads $x_1$ followed by $x_2$ and
terminates, then $u[0] = (1, -1, -1, -1)$, $u[1] = (1, -1, -1, -1)$ and
$u[2] = y$; here $u[0]$ and $u[1]$ coincide because $x_1 = y_1$.) All
$\mathrm{E}[\cdot]$'s and $\Pr[\cdot]$'s in what follows are over all the
random variables just described (i.e., $x$, $y$, $i$'s, $s$,
$u[\cdot]$'s).

We begin with the simple observation

$$\mathrm{Vr}[f] = \mathrm{E}[d(f(x), f(y))] = \mathrm{E}[d(f(u[0]), f(u[s]))],$$

which follows because $y = u[s]$ and $f(x) = f(u[0])$ (although $x$ does
not necessarily equal $u[0]$). This latter equality is the only place
in the proof we use the fact that $T$ computes $f$.

We next make the obvious step

$$\mathrm{E}[d(f(u[0]), f(u[s]))] \le \mathrm{E}\Bigl[\sum_{t=1}^{s} d(f(u[t-1]), f(u[t]))\Bigr] \tag{4}$$

which uses the fact that $d$ is a metric. Set $i_t = 0$ for $t > s$.
Linearity of expectation and $1_{\{t \le s\}} = \sum_{i=1}^{n} 1_{\{i_t = i\}}$
give

$$\mathrm{E}\Bigl[\sum_{t=1}^{s} d(f(u[t-1]), f(u[t]))\Bigr] = \sum_{t=1}^{n} \sum_{i=1}^{n} \mathrm{E}\bigl[d(f(u[t-1]), f(u[t])) \, 1_{\{i_t = i\}}\bigr]. \tag{5}$$

Let $X_t$ denote the sequence of values seen by the decision tree by
time $t$ on input $x$; that is, $X_t = (x_{i_1}, \ldots, x_{i_{t \wedge s}})$,
where $t \wedge s$ denotes the minimum of $t$ and $s$. Note that
$X_{t-1}$ determines $i_t$. Induction on $t \in [n]$ easily shows that
conditional on $X_{t-1}$ the variables $y$ and
$(x_j : j \notin \{i_1, \ldots, i_{(t-1) \wedge s}\})$ are independent and
retain their original distributions. It follows that conditional on
$X_{t-1}$ the pair $(u[t-1], u[t])$ has the distribution $\Omega^{(i)}$
if $i_t = i \in [n]$. Consequently, for $i, t \in [n]$,

$$\mathrm{E}\bigl[d(f(u[t-1]), f(u[t])) \, 1_{\{i_t = i\}} \bigm| X_{t-1}\bigr] = 1_{\{i_t = i\}} \, \mathrm{Inf}_i(f).$$

Taking expectation gives

$$\mathrm{E}\bigl[d(f(u[t-1]), f(u[t])) \, 1_{\{i_t = i\}}\bigr] = \Pr[i_t = i] \, \mathrm{Inf}_i(f).$$

Since $\sum_{t=1}^{n} \Pr[i_t = i] = \delta_i(T)$, combining this with (4)
and (5) completes the proof. $\Box$

3.3 Corollaries and two function version.

In this section we treat some immediate corollaries of Theorem 3.1.
Certainly the analogue of Corollary 1.2 holds for Theorem 3.1, as do
the first and third remarks stated after Corollary 1.2. We now give the
promised "two function" version. Define

$$\mathrm{CoVr}[f, g] = \mathrm{E}_{(x,y) \leftarrow \Omega \times \Omega}[d(f(x), g(y))] - \mathrm{E}_{x \leftarrow \Omega}[d(f(x), g(x))],$$

so in particular $\mathrm{CoVr}[f, f] = \mathrm{Vr}[f]$. Thus the
following theorem generalizes Theorem 3.1.

Theorem 3.2 Let $f, g : \Omega \to (Z, d)$ be functions mapping a finite
$n$-wise product probability space into a metric space, and let
$\mathcal{T}$ be an RDT computing $f$. Then

$$|\mathrm{CoVr}[f, g]| \le \sum_{i=1}^{n} \delta_i(\mathcal{T}) \, \mathrm{Inf}_i(g).$$

Proof: As usual we can assume by averaging that $\mathcal{T}$ is a DDT
$T$ computing $f$. Using the same setup as in the proof of Theorem 3.1
we have

$$\mathrm{CoVr}[f, g] = \mathrm{E}[d(f(x), g(y))] - \mathrm{E}[d(f(u[0]), g(u[0]))]$$
$$\qquad = \mathrm{E}[d(f(u[0]), g(u[s]))] - \mathrm{E}[d(f(u[0]), g(u[0]))]$$

where in the first equality we used that $u[0]$ is, in isolation,
distributed according to $\Omega$, and in the second equality we used
the fact that $f(x) = f(u[0])$ since $T$ computes $f$ (as in the
previous proof). Now using the fact that $d$ is a metric we get

$$\mathrm{CoVr}[f, g] = \mathrm{E}[d(f(u[0]), g(u[s]))] - \mathrm{E}[d(f(u[0]), g(u[0]))] \le \mathrm{E}[d(g(u[0]), g(u[s]))]$$

and of course this is also true for $-\mathrm{CoVr}[f, g]$. The proof
now proceeds exactly as before with $g$ in place of $f$; note that from
this point on in the previous proof we did not use the fact that $T$
computed $f$. $\Box$
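
The hybrid inputs $u[t]$ used in the proofs of Theorems 3.1 and 3.2 can be computed mechanically. The sketch below is illustrative only: the nested-tuple tree encoding, the function names, and the example inputs are assumptions of this example (coordinates are 0-indexed in the code).

```python
# A tree is a leaf value or a triple (i, low, high): query coordinate i
# (0-indexed), continue in `low` if x[i] == -1 and in `high` if x[i] == 1.

def queried_sequence(tree, x):
    """Return [i_1, ..., i_s], the coordinates the tree queries on input x."""
    seq = []
    while isinstance(tree, tuple):
        i, low, high = tree
        seq.append(i)
        tree = high if x[i] == 1 else low
    return seq

def evaluate(tree, x):
    """Return the value the tree outputs on input x."""
    while isinstance(tree, tuple):
        i, low, high = tree
        tree = high if x[i] == 1 else low
    return tree

def hybrids(tree, x, y):
    """Return [u[0], ..., u[s]], where u[t] agrees with x on
    J[t] = {i_r : s >= r > t} and with y on all other coordinates."""
    seq = queried_sequence(tree, x)
    result = []
    for t in range(len(seq) + 1):
        keep = set(seq[t:])  # the coordinates i_{t+1}, ..., i_s
        result.append(tuple(x[j] if j in keep else y[j] for j in range(len(x))))
    return result

# Hypothetical example: a tree on 4 variables that reads x_1, then x_2,
# and outputs x_2.
tree = (0, (1, -1, 1), (1, -1, 1))
x, y = (-1, 1, 1, 1), (1, -1, -1, -1)
u = hybrids(tree, x, y)
```

In this example $u[0] = (-1, 1, -1, -1)$, $u[1] = (1, 1, -1, -1)$ and $u[2] = y$: consecutive hybrids differ only in coordinate $i_t$, the tree outputs the same value on $u[0]$ as on $x$, and $u[s] = y$.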

As mentioned below Corollary 1.2, Theorem 3.2 can be used to give a
lower bound for the randomized decision tree complexity of
approximating $g$. Note that the triangle inequality gives

$$\mathrm{CoVr}[f, g] \ge \mathrm{Vr}[g] - 2\,\mathrm{E}[d(f(x), g(x))].$$

Consequently, Theorem 3.2 implies that for every $\epsilon > 0$ the
expected number of queries required by a randomized decision tree to
calculate any approximation $f$ of $g$ satisfying
$\mathrm{E}[d(f(x), g(x))] \le \epsilon$ is at least

$$\frac{\mathrm{Vr}[g] - 2\epsilon}{\mathrm{Inf}_{\max}(g)}.$$
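
Spelled out, the chain of inequalities behind this bound is the following, where $\mathcal{T}$ is any RDT computing an approximation $f$ with $\mathrm{E}[d(f(x), g(x))] \le \epsilon$ and $\Delta(\mathcal{T}) = \sum_i \delta_i(\mathcal{T})$ is its expected number of queries. This is only a restatement of the step in the text.

```latex
\begin{align*}
\mathrm{Vr}[g] - 2\epsilon
  &\le \mathrm{Vr}[g] - 2\,\mathrm{E}[d(f(x), g(x))]
   \le \mathrm{CoVr}[f, g] \\
  &\le \sum_{i=1}^{n} \delta_i(\mathcal{T})\,\mathrm{Inf}_i(g)
   \le \mathrm{Inf}_{\max}(g) \sum_{i=1}^{n} \delta_i(\mathcal{T})
   = \mathrm{Inf}_{\max}(g)\,\Delta(\mathcal{T}).
\end{align*}
```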

We now describe an alternate version of Theorem 3.2. Let
$f : \Omega \to [-1, 1]$, $g : \Omega \to \mathbb{R}$, and let
$\mathcal{T}$ be a randomized decision tree computing $f$. Then

$$|\mathrm{Cov}[f, g]| \le \sum_{i=1}^{n} \delta_i(\mathcal{T}) \, \mathrm{Inf}^{\rho_1}_i[g], \tag{6}$$

where $\rho_1(x, y) = |x - y|$ and
$\mathrm{Cov}[f, g] = \mathrm{E}[f(x) g(x)] - \mathrm{E}[f(x)]\,\mathrm{E}[g(x)]$
is the covariance of $f$ and $g$. With the usual definitions of $x$,
$y$, $u[t]$ and $s$, we have $f(x) = f(u[0])$ and $u[s] = y$. Hence, we
may write

$$\mathrm{Cov}[f, g] = \mathrm{E}[f(u[0]) g(u[0]) - f(x) g(u[s])] = \mathrm{E}[f(x)(g(u[0]) - g(u[s]))] \le \mathrm{E}[|g(u[0]) - g(u[s])|],$$

and the proof of (6) proceeds as above.

3.4 When d is not a metric. 

In this section we generalize our results to the case when $f$ maps
into $(Z, \rho)$, where $(Z, \rho)$ is a "semimetric". This just means
that $\rho$ need not satisfy the triangle inequality; specifically, all
we require of $\rho$ is that $\rho \ge 0$, $\rho(z, z) = 0$, and
$\rho(z, z') = \rho(z', z)$. (Again we do not insist that
$\rho(z, z') = 0 \Rightarrow z = z'$.) Our main motivation for studying
this extension is the case $Z = \mathbb{R}$ with
$\rho = \rho_2(z, z') := (z - z')^2/2$. In this case
$\mathrm{Vr}^{\rho_2}[f] = \mathrm{Var}[f]$ and $\mathrm{Inf}^{\rho_2}(f)$
has the meaning commonly associated with this notation for functions
$f : \Omega \to \mathbb{R}$; that is, the interpretation used in, e.g.,
the Efron-Stein inequality or in [17].

To study the semimetric case, we simply introduce a quantity measuring
the extent to which the triangle inequality fails for $\rho$ on paths
of length $k$. We define the defect of a sequence
$z_0, z_1, \ldots, z_k \in Z^{k+1}$ to be
$\rho(z_0, z_k) / \bigl(\sum_{t=1}^{k} \rho(z_{t-1}, z_t)\bigr)$, where
$0/0$ is taken to be 1. We then define the $k$-defect of $\rho$, denoted
$\mathrm{Def}_k(\rho)$, to be the maximum defect of any sequence
$z_0, \ldots, z_k$. The following facts are easy to check:

• $\mathrm{Def}_1(\rho) = 1$ and $\mathrm{Def}_k(\rho)$ is nondecreasing
with $k$.

• $\mathrm{Def}_k(\rho) \le (\sup \rho)/(\inf \rho)$ for all $k$.

• $\mathrm{Def}_2(\rho) = 1$ implies that $\rho$ satisfies the triangle
inequality, which, in turn, implies that $\mathrm{Def}_k(\rho) = 1$ for
all $k$; i.e., $\rho$ is a metric.

• If $\rho^{1/q}$ is a metric for some $q \ge 1$, then
$\mathrm{Def}_k(\rho) \le k^{q-1}$ for all $k$. Thus in our motivating
case with $Z = \mathbb{R}$ and $\rho(z, z') = (z - z')^2/2$ we have
$\mathrm{Def}_k(\rho) \le k$.
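
The defect is simple to compute for concrete sequences. The following Python sketch is an illustration under assumptions: it uses the semimetric $\rho_2(z, z') = (z - z')^2/2$ on integer points, and the names `rho2` and `defect` are chosen here for the example.

```python
from fractions import Fraction
from itertools import product

def rho2(z, w):
    """The semimetric rho_2(z, w) = (z - w)^2 / 2, for integer z and w."""
    return Fraction((z - w) ** 2, 2)

def defect(seq, rho=rho2):
    """Defect of z_0, ..., z_k: rho(z_0, z_k) / sum_t rho(z_{t-1}, z_t)."""
    total = sum(rho(a, b) for a, b in zip(seq, seq[1:]))
    end = rho(seq[0], seq[-1])
    if total == 0:
        # 0/0 is taken to be 1. For rho2, total == 0 forces all points
        # to be equal, so the numerator is 0 as well.
        return Fraction(1)
    return end / total
```

For equally spaced points $0, 1, \ldots, k$ the defect under $\rho_2$ is exactly $k$, which matches the bound $\mathrm{Def}_k(\rho_2) \le k$; for instance `defect([0, 1, 2, 3])` evaluates to 3.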

It is easy to see how to generalize Theorems 3.1 and 3.2 for
semimetrics $\rho$; since Theorem 3.2 is more general, we will only
state its extension:

Theorem 3.3 Let $f, g : \Omega \to (Z, \rho)$ be functions mapping an
$n$-wise product probability space into a semimetric space, and let
$\mathcal{T}$ be an RDT computing $f$. Let $k$ be the length of the
longest path in any DDT in $\mathcal{T}$'s support. Then

$$|\mathrm{CoVr}[f, g]| \le \mathrm{Def}_k(\rho) \sum_{i=1}^{n} \delta_i(\mathcal{T}) \, \mathrm{Inf}_i(g).$$

This is the most general version of our main inequality that we state.
In the semimetric setting we are most interested in, namely that of one
function $f : \Omega \to (\mathbb{R}, \rho_2)$, we have the following:

Corollary 3.4 Let $f : \Omega \to (\mathbb{R}, \rho_2)$ be a function
mapping an $n$-wise product probability space into the real line with
semimetric $\rho_2(z, z') = (z - z')^2/2$ and let $\mathcal{T}$ be an
RDT computing $f$. Let $k$ be the length of the longest path in any DDT
in $\mathcal{T}$'s support. Then

$$\mathrm{Var}[f] \le k \sum_{i=1}^{n} \delta_i(\mathcal{T}) \, \mathrm{Inf}^{\rho_2}_i(f),$$

and $f$ has a coordinate with $\rho_2$-influence at least
$\mathrm{Var}[f]/k^2$ (since $\sum_{i=1}^{n} \delta_i(\mathcal{T}) \le k$).

3.5 Tightness of the inequality 

Our main Theorem 3.1 can be tight; one class of DDTs for which it is
tight are read-once decision trees. In fact, it is tight for a broader
family of decision trees, which we now describe. Observe that each
subtree of a decision tree below any given node can be thought of as a
decision tree on the same input $\Omega$ (which may ignore some of the
input variables). Say that a decision tree is separated if, for every
two subtrees $T'$ and $T''$ and every input $x \in \Omega$, if $T'$ and
$T''$ compute different values on $x$, then the sets of variables they
query on input $x$ are disjoint. Clearly, read-once trees are
separated. Later, we will see that separated trees are not necessarily
read-once.

To prove that Theorem 3.1 is tight for every separated tree, note that
the only inequality in the proof of the theorem is (4). Suppose that
$T$ is separated, that $f(u[0]) \ne f(u[t])$ and that $t$ is minimal
with this property. Since $f(u[t]) \ne f(u[t-1])$, on input $u[t]$ the
variable $y_{i_t}$ is inspected by $T$. Let $v'$ be the node of $T$
arrived at right after reading $x_{i_t}$ on input $x$, and let $v''$ be
the node of $T$ arrived at right after reading $y_{i_t}$ on input
$u[t]$. Then on input $u[t]$ the two subtrees of $T$ rooted at $v'$ and
$v''$ calculate $f(x) = f(u[0])$ and $f(u[t])$, respectively. Since
$f(u[t]) \ne f(u[0])$ and $T$ is separated, the sets of variables
examined by these two subtrees on input $u[t]$ are disjoint. In
particular, $f(u[t]) = f(u[t+1]) = \cdots = f(u[s])$. Since
$f(u[0]) = f(u[1]) = \cdots = f(u[t-1])$, this shows that (4) must hold
as an equality when $T$ is separated. Thus, the inequality in Theorem
3.1 holds as an equality in this case. Note that this argument shows
that equality holds even if $d = \rho$ is just a semimetric.

Simple examples of read-once DDTs are those for
$\mathrm{AND} : \{-1,1\}^n \to \{-1,1\}$ and
$\mathrm{OR} : \{-1,1\}^n \to \{-1,1\}$. The simplest nontrivial balanced
example is the "selection function"
$\mathrm{SEL} : \{-1,1\}^3 \to \{-1,1\}$, which maps $(x_1, x_2, x_3)$ to
$x_2$ if $x_1 = 1$, or $x_3$ if $x_1 = -1$. To describe a collection of
trees that are separated but not read-once, we consider (disjoint)
compositions. A disjoint composition is a function
$F = f(f_1, \ldots, f_m)$ where each $f_j$ acts on a disjoint set of
input variables, and the value of each of the input variables $x_j$ of
$f$ is the value of $f_j$. An example is given by Tribes (OR of disjoint
ANDs). It should be clear that a representation of a function as a
disjoint composition $F = f(f_1, \ldots, f_m)$ together with a DDT for
each factor function $f, f_1, \ldots, f_m$ induces a DDT for the
composition; one just needs to replace each node of the tree computing
$f$ by a corresponding tree computing a function $f_j$. It is not too
hard to check that if each of the original trees is separated, then
also the tree calculating $F = f(f_1, \ldots, f_m)$ is separated. In
particular, recursive disjoint compositions of read-once trees are
separated. On the other hand, it is easy to see that the simplest
nontrivial Tribes function $(x_1 \wedge x_2) \vee (x_3 \wedge x_4)$
cannot be represented by a read-once tree.
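
The tightness claims can be checked numerically in the boolean case. The following is a minimal sketch under stated assumptions: the nested-tuple tree encoding, the helper `gap` and the three concrete trees are illustrative choices, with $+1$ read as "true" in the Tribes tree and a 3-bit majority tree included only as a contrasting example where the inequality is strict.

```python
from fractions import Fraction
from itertools import product

# A tree is a leaf value (+1 or -1) or a triple (i, low, high): query
# coordinate i (0-indexed), continue in `low` if x[i] == -1, else `high`.
SEL_TREE = (0, (2, -1, 1), (1, -1, 1))             # x_2 if x_1 = 1, else x_3
AND_34 = (2, -1, (3, -1, 1))                       # x_3 AND x_4
TRIBES_TREE = (0, AND_34, (1, AND_34, 1))          # (x_1 AND x_2) OR (x_3 AND x_4)
MAJ3_TREE = (0, (1, -1, (2, -1, 1)), (1, (2, -1, 1), 1))

def run(tree, x):
    """Return (output value, set of coordinates queried) on input x."""
    queried = set()
    while isinstance(tree, tuple):
        i, low, high = tree
        queried.add(i)
        tree = high if x[i] == 1 else low
    return tree, queried

def weight(x, p):
    """p-biased product measure: each coordinate is +1 with probability p."""
    w = Fraction(1)
    for xi in x:
        w *= p if xi == 1 else 1 - p
    return w

def gap(tree, n, p):
    """Return sum_i delta_i(T) Inf_i(f) - Var[f] for the f computed by tree."""
    cube = list(product((-1, 1), repeat=n))
    out = {x: run(tree, x) for x in cube}
    mean = sum(weight(x, p) * out[x][0] for x in cube)
    var = 1 - mean ** 2  # E[f^2] = 1 for a {-1, 1}-valued f
    rhs = Fraction(0)
    for i in range(n):
        delta = sum(weight(x, p) for x in cube if i in out[x][1])
        disagree = Fraction(0)
        for x in cube:
            for b in (-1, 1):  # b is the rerandomized value of coordinate i
                xi = x[:i] + (b,) + x[i + 1:]
                if out[x][0] != out[xi][0]:
                    disagree += weight(x, p) * (p if b == 1 else 1 - p)
        rhs += delta * 2 * disagree
    return rhs - var
```

For the read-once tree for SEL and the composed tree for the four-variable Tribes function the gap is zero, both at $p = 1/2$ and at a biased $p$ such as $1/3$, whereas for the majority tree at $p = 1/2$ the gap is $1/4$.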

Finally, we discuss the necessity of the factor $k$ in Corollary 3.4.
Indeed, as far as we know, it may be possible to replace the factor $k$
by an absolute constant. However we can show that the factor $k$ cannot
be replaced by 1. The $\{-1,1\}^3 \to (\mathbb{R}, \rho_2)$ example shown
in Figure 1 demonstrates that a constant slightly greater than 1 is
necessary. Except for optimizing the leaf labels in this particular
tree, this is the worst example we know.

4 Questions for Future Work 

• Is it possible to explain the "coincidence" that our near-tight
lower bound on $\Delta(f)$ for monotone transitive functions gives a
lower bound for graph properties (about $v^{4/3}$) that essentially
matches the lower bound barrier that has stood since Hajnal '91 [13]?
Perhaps either the Hajnal or the Chakrabarti-Khot [9] arguments can be
reframed in terms of merely transitive functions (if true of
Chakrabarti-Khot, this would be quite interesting); or, perhaps
graph-theoretic arguments can augment our elementary probabilistic
reasoning to produce a better lower bound.

• Can our inequality in the real-valued, $\rho_2$ case (Corollary 3.4)
be sharpened? If the factor $k$ could be replaced by a universal
constant, this would be a very strong variant of the Efron-Stein
inequality.

• What other applications might our main inequality have? We suggest
there might be applications in computational learning theory or in the
theory of random graphs.

Figure 1: Left edges correspond to input variables with value $-1$,
right edges to value $1$. For the function
$f : \{-1,1\}^3 \to \mathbb{R}$ computed by this DDT, $\delta_1(T) = 1$
and $\sum_{i=1}^{3} \delta_i(T) \, \mathrm{Inf}^{\rho_2}_i(f) < \mathrm{Var}[f]$,
where $\rho_2(x, y) = (x - y)^2/2$.